Unsupervised Transduction Grammar Induction via Minimum Description Length
نویسندگان
چکیده
We present a minimalist, unsupervised learning model that induces relatively clean phrasal inversion transduction grammars by employing the minimum description length principle to drive search over a space defined by two opposing extreme types of ITGs. In comparison to most current SMT approaches, the model learns a very parsimonious phrase translation lexicons that provide an obvious basis for generalization to abstract translation schemas. To do this, the model maintains internal consistency by avoiding use of mismatched or unrelated models, such as word alignments or probabilities from IBM models. The model introduces a novel strategy for avoiding the pitfalls of premature pruning in chunking approaches, by incrementally splitting an ITGwhile using a second ITG to guide this search.
منابع مشابه
Unsupervised Learning of Bilingual Categories in Inversion Transduction Grammar Induction
We present the first known experiments incorporating unsupervised bilingual nonterminal category learning within end-to-end fully unsupervised transduction grammar induction using matched training and testing models. Despite steady recent progress, such induction experiments until now have not allowed for learning differentiated nonterminal categories. We divide the learning into two stages: (1...
متن کاملLearning Bilingual Categories in Unsupervised Inversion Transduction Grammar Induction
We present the first known experiments incorporating unsupervised bilingual nonterminal category learning within end-to-end fully unsupervised transduction grammar induction using matched training and testing models. Despite steady recent progress, such induction experiments until now have not allowed for learning differentiated nonterminal categories. We divide the learning into two stages: (1...
متن کاملIterative Rule Segmentation under Minimum Description Length for Unsupervised Transduction Grammar Induction
We argue that for purely incremental unsupervised learning of phrasal inversion transduction grammars, a minimum description length driven, iterative top-down rule segmentation approach that is the polar opposite of Saers, Addanki, and Wu’s previous 2012 bottom-up iterative rule chunking model yields significantly better translation accuracy and grammar parsimony. We still aim for unsupervised ...
متن کاملCombining Top-down and Bottom-up Search for Unsupervised Induction of Transduction Grammars
We show that combining both bottom-up rule chunking and top-down rule segmentation search strategies in purely unsupervised learning of phrasal inversion transduction grammars yields significantly better translation accuracy than either strategy alone. Previous approaches have relied on incrementally building larger rules by chunking smaller rules bottomup; we introduce a complementary top-down...
متن کاملSegmenting vs. Chunking Rules: Unsupervised ITG Induction via Minimum Conditional Description Length
We present an unsupervised learning model that induces phrasal inversion transduction grammars by introducing a minimum conditional description length (CDL) principle to drive search over a space defined by two opposing extreme types of ITGs. Our approach attacks the difficulty of acquiring more complex longer rules when inducing inversion transduction grammars via unsupervised bottom-up chunki...
متن کامل